How we resolve the slowness of integration tests in Serverless with the parallelism of CircleCI
Introduction
Integration tests have become increasingly important in serverless development. This point has already been discussed elsewhere, so you may have heard it before. In short, serverless applications consist of many fully managed services provided by cloud vendors such as AWS, which makes it hard or impossible to simulate those services in a local environment. LocalStack, SAM CLI and similar tools have been tackling this problem, but, at least for now, they are not powerful enough to replace the services in actual environments because they lack some services and features.
Hence, integration tests are executed against actual environments in most cases, because we have no other choice. This is a rational strategy so far, but it introduces other problems. One of them is slow test feedback. Integration tests are much slower than unit tests because they access their test targets over the network. We had difficulty handling this problem: we should write tests to improve the quality of our software, but every test we add lengthens the execution time.
To make the problem clearer, let me explain how we run integration tests in our project. Please look at the picture below:
This is the architecture we are developing. We are building an API mainly composed of Amazon API Gateway, AWS Lambda, and Amazon DynamoDB. We actually use other AWS services in our project as well, but I do not touch on them because they are not related to this post.
The client is a mobile app, which communicates with the backend systems through the API. The problem is the integration tests of the API. What the tests do is access the API with an HTTP client and evaluate the results. The tests take about 13 minutes to complete, which is very slow.
In this post, I introduce how we resolved the problem with the parallelism feature of CircleCI.
Parallelism in CircleCI
To reduce the execution time, we decided to use the parallelism feature of CircleCI. This is the most concise and convenient way to run tests in parallel that I know of. We use pytest for all our tests, so we could also leverage its plugins (e.g., pytest-xdist, pytest-parallel and so on). But one of the advantages of CircleCI parallelism is that it is independent of any programming language, so we can use it with any language. Considering that we might choose another language in the future, we decided to use it.
Anyway, let's get back on track. Making use of the feature takes two steps. Let's take a closer look at them.
- Setting up parallelism in a job
- Distributing test cases by using CircleCI CLI
First, you need to set the parallelism key in your job. It specifies how many containers run in parallel. Note that the key has to be placed at the job level; CircleCI does not currently support parallelism at the step level. Therefore, you need to consider the granularity of the job. For example, if the job contains many steps that do not need to run in parallel, every container repeats them, which reduces the benefit. To get the full benefit, separate those steps from the job.
It is very easy to use. All you have to do is specify the parallelism key in your job, for example:
version: 2
jobs:
  test:
    docker:
      - image: circleci/<language>:<version TAG>
    parallelism: 4
Next, the CircleCI CLI. The tool is mainly used for debugging and validating CircleCI's config file (config.yml) in local environments. It also has a tests command with two subcommands, glob and split. The glob subcommand is used to discover test files, and the split subcommand is used to distribute test files among containers.
I will explain these in more detail.
The glob subcommand resembles the test discovery that testing frameworks usually provide. It accepts wildcards to filter test files. For example:
$ circleci tests glob "tests/integration/test_*.py"
tests/integration/test_foo.py
tests/integration/test_bar.py
tests/integration/test_baz.py
tests/integration/test_blah.py
...
The split subcommand determines how the test files passed to it are distributed among containers. A common pattern is to pipe the output of the glob subcommand into the stdin of the split subcommand. Look at the example:
$ circleci tests glob "tests/integration/test_*.py" | circleci tests split
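To build some intuition for what happens in each container, here is a simplified model of the splitting written in Python. This is only a sketch under my own assumptions, not CircleCI's actual algorithm: the split_by_name function is hypothetical, and it simply sorts the file names and deals them out round-robin. CIRCLE_NODE_INDEX and CIRCLE_NODE_TOTAL are the environment variables CircleCI sets in each parallel container.

```python
import os


def split_by_name(test_files, node_index, node_total):
    """Return the test files that one container would run.

    Simplified model (an assumption, not CircleCI's real implementation):
    sort the names, then deal them out round-robin across containers.
    """
    ordered = sorted(test_files)
    return [name for i, name in enumerate(ordered) if i % node_total == node_index]


if __name__ == "__main__":
    files = [
        "tests/integration/test_foo.py",
        "tests/integration/test_bar.py",
        "tests/integration/test_baz.py",
        "tests/integration/test_blah.py",
    ]
    # Outside CircleCI these variables are unset, so fall back to one container.
    index = int(os.environ.get("CIRCLE_NODE_INDEX", "0"))
    total = int(os.environ.get("CIRCLE_NODE_TOTAL", "1"))
    print("\n".join(split_by_name(files, index, total)))
```

The important property is that every container receives the same full list, computes its own share from its index, and the shares never overlap. That is why the same command can be written once in the config file and still do different work in each container.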
In this section, I introduced how to use parallelism in CircleCI. Next, I explain how we use the feature in our project to reduce the execution time.
Parallelism in Integration Tests
As mentioned above, we use pytest for our tests. We give the test files the prefix test_ and place them in the tests/integration directory. Therefore, you can create a CircleCI config file like the following:
build_and_test_unit:
  parallelism: 2
  <<: *build_and_test_container
  steps:
    ...
    - run:
        name: Run integration tests
        command: |
          source aws-envs.sh
          source .venv/bin/activate
          python -m pytest $(circleci tests glob "tests/integration/test_*.py" | circleci tests split)
We just pass the test files to pytest as arguments, and the files are distributed among the containers. Let me show you how CircleCI handles the job. Let's look at the picture below:
You may notice the number at the end of each step. This is the index of the container. In this case, I set parallelism to 2, so two containers are launched. Also look at the execution time: it dropped to about 5 minutes from the roughly 13 minutes it took before.
And lastly, please look at the picture below. You can see that the same step runs different test files in each container. That is, the split subcommand distributes the tests among the containers.
Conclusion
In this post, I discussed the parallelism feature of CircleCI and a real-world example of using it. I think it can be applied in many situations. I hope you find this post helpful.